RelaxMCD: Smooth optimisation for the Minimum Covariance Determinant estimator
Abstract
The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the centre and shape of a high-dimensional data set. It consists of determining a subsample of h points out of n which minimises the generalised variance. By definition, the computation of this estimator gives rise to a combinatorial optimisation problem, for which several approximate algorithms have been developed. Some of these approximations are quite powerful, but they do not take advantage of any smoothness in the objective function. Recently, in a general framework, an approach transforming any discrete and high-dimensional combinatorial problem of this type into a continuous and low-dimensional one has been developed and a general algorithm to solve the transformed problem has been designed. The idea is to build on that general algorithm in order to take into account particular features of the MCD methodology. More specifically, two main goals are considered: (a) adaptation of the algorithm to the specific MCD target function and (b) comparison of this ‘tuned’ algorithm with the usual competitors for computing MCD. The adaptation focuses on the design of ‘clever’ starting points in order to systematically investigate the search domain. Accordingly, a new and surprisingly efficient procedure based on a suitably equivariant modification of the well-known k-means algorithm is constructed. The adapted algorithm, called RelaxMCD, is then compared by means of simulations with FASTMCD and the Feasible Subset Algorithm, both benchmark algorithms for computing MCD. As a by-product, it is shown that RelaxMCD is a general technique encompassing the two others, yielding insight into their overall good performance. © 2009 Elsevier B.V. All rights reserved.
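To make the combinatorial definition concrete, the following is a minimal sketch of an exhaustive MCD search on a tiny two-dimensional data set. It is not RelaxMCD, FASTMCD, or the Feasible Subset Algorithm; it only enumerates every h-subset and keeps the one whose sample covariance matrix has the smallest determinant (the generalised variance). The function names `generalised_variance` and `brute_force_mcd` and the toy data are assumptions made for illustration, and the restriction to two dimensions is only there to keep the determinant formula explicit.

```python
from itertools import combinations


def generalised_variance(points):
    """Determinant of the 2x2 sample covariance matrix of 2-D points."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (m - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (m - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (m - 1)
    return sxx * syy - sxy ** 2


def brute_force_mcd(data, h):
    """Search all h-subsets exhaustively; feasible only for very small n."""
    best = min(
        combinations(range(len(data)), h),
        key=lambda idx: generalised_variance([data[i] for i in idx]),
    )
    subset = [data[i] for i in best]
    centre = (sum(x for x, _ in subset) / h, sum(y for _, y in subset) / h)
    return best, centre, generalised_variance(subset)


# Toy data (made up for illustration): six clustered points, two gross outliers.
data = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
        (-0.2, -0.2), (0.0, -0.1), (8.0, 9.0), (9.0, 8.0)]
indices, centre, det = brute_force_mcd(data, h=6)
print(indices, centre, det)
```

The number of subsets grows as the binomial coefficient of n and h, so this enumeration is only practical for toy examples, which is why approximate algorithms such as those compared in the paper are needed.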
Similar resources
Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator
The purpose of this paper is to identify the points that affect the performance of the support vector machine, one of the important algorithms in data mining. The final classification decision is based on a small portion of the data called support vectors. The presence of atypical observations among these points therefore results in deviation from the correct decision. Thus...
Robustified distance based fuzzy membership function for support vector machine classification
Fuzzification of the support vector machine has been used to deal with the problem of outliers and noise. This is achieved by means of a fuzzy membership function, which is generally built from the distance of the points to the class centroid. The focus of this research is twofold. Firstly, by taking the advantage of robust statistics in the fuzzy SVM, more emphasis on reducing the im...
Influence Function and Efficiency of the Minimum Covariance Determinant Scatter Matrix Estimator
The Minimum Covariance Determinant (MCD) scatter estimator is a highly robust estimator for the dispersion matrix of a multivariate, elliptically symmetric distribution. It is relatively fast to compute and intuitively appealing. In this note we derive its influence function and compute the asymptotic variances of its elements. A comparison with the one step reweighted MCD and with S-estimators ...
Nonsingular Robust Covariance Estimation in Multivariate Outlier Detection
Rousseeuw’s minimum covariance determinant (MCD) method is a highly robust estimator of multivariate mean and covariance. In practice, the MCD covariance estimator may be singular. However, a nonsingular covariance estimator is required to calculate the Mahalanobis distance. In order to fix this singularity problem, we propose an improved version of the MCD estimator, which is a combination of the...
Highly Robust Estimation of Dispersion Matrices
In this paper, we propose a new componentwise estimator of a dispersion matrix, based on a highly robust estimator of scale. The key idea is the elimination of a location estimator in the dispersion estimation procedure. The robustness properties are studied by means of the influence function and the breakdown point. Further characteristics such as asymptotic variance and efficiency are also an...
Journal title:
- Computational Statistics & Data Analysis
Volume 54
Publication date: 2010